Parse Tree Database for Information Extraction

Authors

Luis Tari

Phan Huy Tu

Jörg Hakenberg

Yi Chen

Tran Cao Son

Graciela Gonzalez

Chitta Baral

Abstract

Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus, even though only a small part of the corpus might be affected. In this paper, we describe a novel approach to information extraction in which extraction needs are expressed in the form of database queries, which are evaluated and optimized by database systems. Using database queries for information extraction enables generic extraction and minimizes reprocessing of data. In addition, our approach provides two different query generation components that can automatically form database queries for extraction from training datasets, as well as from unlabeled data through a mechanism inspired by the pseudo-relevance feedback approach. The approach is applied to the extraction of protein-protein interactions and drug-protein-metabolic relations from two corpora. Experiments show that our approach achieves a precision of 83.6% and recall of 58.6% (F-measure of 64.2%) for the extraction of protein-protein interactions from the BioCreative 2 corpus, while achieving a precision of 85.0% and recall of 26.0% (F-measure of 39.8%) for drug-protein-metabolic relations.

Similar resources

Efficient Information Retrieval System using Incremental Approach

Information Retrieval Systems [12][19] are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be reapplied from scratch to the entire text corpus even though only a small part of the...

Composite Kernels For Relation Extraction

The automatic extraction of relations between entities expressed in natural language text is an important problem for IR and text understanding. In this paper we show how different kernels for parse trees can be combined to improve the relation extraction quality. On a public benchmark dataset the combination of a kernel for phrase grammar parse trees and for dependency parse trees outperforms ...

Exploring syntactic structured features over parse trees for relation extraction using kernel methods

Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can effe...

Extracting Causal Knowledge from a Medical Database Using Graphical Patterns

This paper reports the first part of a project that aims to develop a knowledge extraction and knowledge discovery system that extracts causal knowledge from textual databases. In this initial study, we develop a method to identify and extract cause-effect information that is explicitly expressed in medical abstracts in the Medline database. A set of graphical patterns were constructed that ind...

Exploiting Constituent Dependencies for Tree Kernel-Based Semantic Relation Extraction

This paper proposes a new approach to dynamically determine the tree span for tree kernel-based semantic relation extraction. It exploits constituent dependencies to keep the nodes and their head children along the path connecting the two entities, while removing the noisy information from the syntactic parse tree, eventually leading to a dynamic syntactic parse tree. This paper also explores e...

Publication date: 2010
